Edward Vytlacil
MC simulation is used broadly across economics:
Generate random draws in R;
Approximate the probability of some event by simulation;
Evaluate expected value of some function by simulation;
Next Lecture:
We will return to computationally intensive methods for inference later in the course.
Simulating one roll of a die:
sample(1:6, 1) randomly draws one element from {1,2,3,4,5,6}, each element equally likely, one time.
Computers are deterministic.
To create truly random numbers, can use a physical process, e.g., from nuclear decay of radioisotopes.
Not necessary for our purposes, and not what R is doing…
R is generating pseudo-random numbers
You may obtain different pseudo-random numbers each time you run your code, so neither you nor others can replicate your results.
For replication purposes, use set.seed() to fix the state of the pseudo-random number generator.
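A minimal illustration:

```r
# A seed fixes the PRNG state, making pseudo-random draws reproducible
set.seed(123)
x1 <- rnorm(5)

set.seed(123)   # reset to the same state
x2 <- rnorm(5)

identical(x1, x2)   # TRUE: the draws are exactly reproduced
```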
Common distributions with built-in generators in R:
| Distribution | Function | Returns |
|---|---|---|
| Binomial | rbinom(n,size,prob) | \(n\) draws from \(\mbox{Binom}(\mbox{size},\mbox{prob})\) |
| Normal | rnorm(n,mean=0,sd=1) | \(n\) draws from \(N(\mbox{mean},\mbox{sd})\) |
| Student-t | rt(n,df) | \(n\) draws from the \(t_{df}\) distribution |
| Uniform | runif(n,min=0,max=1) | \(n\) draws from \(\mbox{Unif}[\mbox{min},\mbox{max}]\) |
| Poisson | rpois(n,lambda) | \(n\) draws from \(\mbox{Pois}(\lambda)\) |
Theorem. (Inverse Probability Transform) Let \(U \sim \text{Unif}(0,1)\) and let \(F\) be a CDF with generalized inverse \[F^{-1}(u) = \inf\{x : F(x) \ge u\}.\] Then \(X = F^{-1}(U)\) has distribution \(F\).
Implication: To draw from any distribution with known \(F^{-1}\), draw \(U \sim \text{Uniform}(0,1)\) and compute \(F^{-1}(U)\).
Use when \(F^{-1}\) is available in closed form and no built-in generator exists.
Pareto distribution: \(F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha\) for \(x \ge x_m > 0\).
Inverting: \(F^{-1}(u) = \frac{x_m}{(1-u)^{1/\alpha}}\).
For \(B=100,000\) draws from \(\text{Pareto}(\alpha=2.5, x_m=1)\) by inverse CDF method:
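A sketch in R (object names are ours):

```r
# Inverse-CDF draws from Pareto(alpha = 2.5, x_m = 1)
set.seed(1)
B <- 100000
alpha <- 2.5
x_m <- 1

u <- runif(B)                      # U ~ Unif(0,1)
x <- x_m / (1 - u)^(1 / alpha)     # X = F^{-1}(U) ~ Pareto(alpha, x_m)

min(x)    # support check: always >= x_m
mean(x)   # theoretical mean: alpha * x_m / (alpha - 1) = 5/3
```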
Constructive (algorithmic) simulation: if you can describe the process that generates \(X\) as a sequence of steps involving simple random primitives (coin flips, uniform draws, normal draws, …), simulate \(X\) by executing that algorithm directly.
Does not require knowing the CDF of \(X\) — only requires knowing the mechanism that produces \(X\).
This is often the natural approach to simulate economic models: we specify primitives of a model (preferences, technology, shocks) and simulate outcomes.
Suppose a r.v. \(X\) is defined as a function of a (possibly random-length) sequence of simpler r.v.s: \[X = g(Z_1, Z_2, \ldots)\] where each \(Z_i\) is easy to simulate. Then to draw \(X\):
Let \(X\) = number of fair coin flips until 4 heads in a row.
\(X\) is a well-defined random variable, but its PMF has no simple closed form.
But we know exactly how to generate \(X\): flip coins and count.
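The algorithm translates directly into code. A minimal sketch of run_until_4H(), whose name matches the function used below (the body here is our reconstruction, not the lecture's):

```r
# Flip a fair coin until 4 consecutive heads; return the number of flips
run_until_4H <- function() {
  streak <- 0   # current run of consecutive heads
  flips  <- 0   # total flips so far
  while (streak < 4) {
    flips <- flips + 1
    if (runif(1) < 0.5) {
      streak <- streak + 1   # heads: extend the run
    } else {
      streak <- 0            # tails: reset the run
    }
  }
  flips
}

set.seed(5551)
run_until_4H()   # one realization of X (always at least 4)
```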
How to draw \(B\) independent copies of \(X\)?
replicate(B, expr) evaluates expr \(B\) times and returns a vector of the results.
Using replicate with our previously defined function run_until_4H() to draw \(B\) independent copies of \(X\):
R is an interpreted language with high per-iteration overhead — scalar for/while loops can be very slow.
Vectorized operations (operating on entire vectors at once) call optimized C/Fortran code under the hood and are typically orders of magnitude faster.
Timing (system.time): user 0.317, system 0.007, elapsed 0.325 seconds.
Use replicate (or sapply/vapply) for the outer loop over MC replications.
Our Pareto example was fully vectorized: x_m / (1 - runif(B))^(1/alpha) applies the inverse CDF element-wise to an entire vector of uniform draws. No loop or replicate needed.
Our run_until_4H() requires a loop (random stopping time — we don’t know how many draws are needed in advance).
Not everything can be vectorized. But when the number of draws is fixed, vectorize aggressively.
The same principle applies to more complex settings:
Order statistics: draw \(Y_1,\ldots,Y_N \stackrel{i.i.d.}{\sim} F\), return \(X = Y_{(k)}\).
First passage times: simulate a random walk \(S_n = \sum_{i=1}^n Z_i\), return \(X = \inf\{n : S_n > c\}\).
Estimators and test statistics: draw a sample from a DGP, compute \(\hat{\beta}\) or a \(t\)-statistic.
Auction outcomes: draw bidder valuations from \(F\), compute equilibrium bids, return winning bid or seller revenue.
Search models: worker draws wage offers each period, accepts or continues searching according to a reservation wage rule. Simulate time to employment, accepted wage, etc.
Firm Exit: simulate profit shocks and exit decision, return firm lifespan.
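As one illustration of the pattern (our sketch, with assumed parameter values): the first-passage-time example can be simulated by running the walk directly.

```r
# Illustrative sketch (not lecture code): first passage time of a random
# walk with positive drift, simulated by executing the mechanism directly
first_passage <- function(c_bar = 10, drift = 0.5) {
  S <- 0   # position of the walk
  n <- 0   # number of steps taken
  while (S <= c_bar) {
    n <- n + 1
    S <- S + rnorm(1, mean = drift)   # one N(drift, 1) increment
  }
  n   # first n with S_n > c_bar
}

set.seed(1)
passage_times <- replicate(1000, first_passage())
mean(passage_times)   # roughly c_bar / drift = 20 by Wald's identity
```

No knowledge of the distribution of the passage time is needed, only the mechanism.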
We next turn to using MC to approximate probabilities and expectations.
For given distribution \(F\) and set \(A\), approximate \(\mathbb{P}_F[X \in A]\) by \[\frac{1}{B}\sum_{b=1}^B \mathbf{1}\{X_b \in A\}, \qquad X_1,\ldots,X_B \stackrel{i.i.d.}{\sim} F.\]
By WLLN, \[\frac{1}{B}\sum_{b=1}^B \mathbf{1}\{X_b \in A\} \xrightarrow{P} \mathbb{P}_F[X \in A]\qquad \text{as} \qquad B \to \infty.\]
For example, what is the probability that it takes fewer than 20 flips to get 4 heads in a row?
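A sketch of the estimator for this event, with a stand-in definition of run_until_4H() included so the chunk runs on its own:

```r
# Stand-in for run_until_4H(): fair-coin flips until 4 heads in a row
run_until_4H <- function() {
  streak <- 0; flips <- 0
  while (streak < 4) {
    flips <- flips + 1
    streak <- if (runif(1) < 0.5) streak + 1 else 0
  }
  flips
}

set.seed(5551)
B <- 100000
X <- replicate(B, run_until_4H())   # B i.i.d. copies of X

p_hat <- mean(X < 20)   # proportion of indicator 1{X_b < 20}: MC estimate
p_hat
```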
Let \(p = \mathbb{P}_F[X \in A]\) and \(\hat{p}_B = \frac{1}{B}\sum_{b=1}^B \mathbf{1}\{X_b \in A\}\).
Each \(\mathbf{1}\{X_b \in A\} \stackrel{i.i.d.}{\sim} \text{Bernoulli}(p)\), so \(B\,\hat{p}_B \sim \text{Binomial}(B, p)\).
Therefore:
\(\text{SD}(\hat{p}_B) = \sqrt{\frac{p(1-p)}{B}}\), the MC standard error.
Since \(p(1-p) \le 1/4\), the MC standard error is at most \(\frac{1}{2\sqrt{B}}\), regardless of \(p\).
| \(B\) | Max MC Std. Error |
|---|---|
| \(1{,}000\) | \(0.0158\) |
| \(10{,}000\) | \(0.0050\) |
| \(100{,}000\) | \(0.0016\) |
| \(1{,}000{,}000\) | \(0.0005\) |
What if we repeat the entire MC exercise \(R = 1{,}000\) times, each with \(B\) replications?
By CLT, an approximate \(95\%\) confidence interval for \(p\) based on MC simulation: \[\hat{p}_B \;\pm\; 1.96\,\sqrt{\frac{\hat{p}_B(1-\hat{p}_B)}{B}}.\]
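A minimal helper implementing this interval (mc_ci and the plugged-in values are illustrative):

```r
# 95% MC confidence interval for p, given estimate p_hat from B replications
mc_ci <- function(p_hat, B, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)       # ~1.96 for a 95% interval
  se <- sqrt(p_hat * (1 - p_hat) / B)    # estimated MC standard error
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

round(mc_ci(p_hat = 0.46, B = 100000), 4)   # half-width about 0.003
```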
To achieve MC accuracy of \(\pm \epsilon\) with \(95\%\) confidence, need \(B \ge \frac{(1.96)^2 \, p(1-p)}{\epsilon^2}\); since \(p(1-p) \le 1/4\), it suffices that \[B \ge \frac{(1.96)^2}{4\,\epsilon^2} \approx \frac{1}{\epsilon^2}.\]
To achieve MC accuracy of \(\pm \epsilon\) with \(95\%\) confidence requires approximately \(B \approx 1/\epsilon^2\) replications:
| Desired accuracy (\(\epsilon\)) | Required \(B\) (worst case) |
|---|---|
| \(\pm 0.01\) | \(\approx 10{,}000\) |
| \(\pm 0.001\) | \(\approx 1{,}000{,}000\) |
| \(\pm 0.0001\) | \(\approx 100{,}000{,}000\) |
Diminishing returns: each additional decimal of accuracy costs \(100\times\) more replications.
The math is identical: for i.i.d. draws, \(\text{Var}(\bar{X}_B) = \sigma^2/B\) whether the draws come from a MC simulation or from sampling a population.
The key practical difference: in statistical estimation, the sample size \(n\) is typically fixed by the data you have (or can afford to collect). In MC simulation, \(B\) is a choice variable — you can always reduce MC error by running more replications.
Implication: MC approximation error is not a fundamental limitation. If your MC standard error is too large, increase \(B\) and wait longer. The only cost is computation time.
This is why we distinguish between statistical (sampling) error, which is fixed by the data, and MC approximation error, which we control through \(B\).
More generally, approximate \(\mathbb{E}_F[g(X)]\) by \[\frac{1}{B}\sum_{b=1}^B g(X_b), \qquad X_1,\ldots,X_B \stackrel{i.i.d.}{\sim} F.\]
Approximating \(\mathbb{P}_F[X \in A]\) is a special case with \(g(x) = \mathbf{1}\{x \in A\}\).
By WLLN, \(\frac{1}{B}\sum_{b=1}^B g(X_b) \xrightarrow{P} \mathbb{E}_F[g(X)]\) as \(B \to \infty\),
provided \(\mathbb{E}_F[|g(X)|] < \infty\).
The MC standard error is now \(\text{SD}(g(X))/\sqrt{B},\) estimated by \(\hat{\sigma}_g / \sqrt{B}\) where \[\hat{\sigma}_g^2 = \frac{1}{B-1}\sum_{b=1}^B \left(g(X_b) - \bar{g}\right)^2, \qquad \bar{g}= \frac{1}{B} \sum_{b=1}^B g(X_b) .\]
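A sketch with an illustrative choice of \(g\) (ours, not the lecture's): approximate \(\mathbb{E}[e^Z]\) for \(Z \sim N(0,1)\), whose true value is \(e^{1/2} \approx 1.649\).

```r
set.seed(1)
B  <- 100000
gZ <- exp(rnorm(B))          # g(X_b) = exp(Z_b) for Z_b ~ N(0,1)

g_bar <- mean(gZ)            # MC estimate of E[g(X)]
se_g  <- sd(gZ) / sqrt(B)    # estimated MC standard error sigma_hat_g / sqrt(B)

c(estimate = g_bar, mc_se = se_g, truth = exp(0.5))
```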
The earlier formula is the special case for \(g = \mathbf{1}_A\).
As before, accuracy improves at rate \(1/\sqrt{B}\), but now the constant depends on \(\text{Var}(g(X))\).
Using our earlier \(100{,}000\) draws of \(X\) = flips until 4 consecutive heads:
| mean | E[X²] | var | P(X < 20) |
|---|---|---|---|
| 30.00 | 1600.00 | 740.00 | 0.46 |
Plotting the cumulative average \(\bar{X}_B = \frac{1}{B}\sum_{b=1}^B X_b\) as \(B\) grows:
The Cauchy distribution has density \[f(x) = \frac{1}{\pi(1 + x^2)}, \qquad x \in \mathbb{R}.\]
Symmetric around zero, looks similar to a standard normal but with much heavier tails.
Key property: \(\mathbb{E}[|X|] = \infty\), so \(\mathbb{E}[X]\) does not exist.
Across \(1{,}000\) draws of \(\bar{X}_{10{,}000}\), each based on \(10{,}000\) i.i.d. Cauchy r.v.s:
| var | min | max |
|---|---|---|
| 65,000 | -210 | 8,000 |
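A sketch of the code behind these summaries (exact numbers depend on the seed):

```r
# 1,000 sample means, each from 10,000 i.i.d. standard Cauchy draws
set.seed(2)
R_runs <- 1000
xbar <- replicate(R_runs, mean(rcauchy(10000)))

# Each sample mean is itself standard Cauchy, so the summaries
# remain wildly dispersed no matter how large B is
c(var = var(xbar), min = min(xbar), max = max(xbar))
```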
The variance of \(\bar{X}_B\) does not shrink with \(B\).
One can show that the sample average of i.i.d. Cauchy r.v.s is itself Cauchy: averaging provides no improvement whatsoever.
MC approximation of \(\mathbb{E}_F[g(X)]\) via \(\frac{1}{B}\sum_{b=1}^B g(X_b)\) is consistent only if \(\mathbb{E}_F[|g(X)|] < \infty\).
The Cauchy example is a cautionary tale, but also illustrates a practical point: always check whether the moments you are trying to approximate actually exist.
In practice, you may not know whether \(\mathbb{E}[|g(X)|] < \infty\). Some warning signs:
Instability across runs: re-run the MC simulation with different seeds. If \(\bar{g}_B\) varies wildly across runs even for large \(B\), suspect a moment problem.
Running mean doesn’t settle: plot \(\bar{X}_B\) as a function of \(B\). Convergence should be visually apparent; persistent drift or jumps suggest trouble.
Sensitivity to extreme draws: if removing the largest \(1\%\) of \(|g(X_b)|\) values substantially changes \(\bar{g}_B\), the tails are driving the estimate.
seeds <- 1:25
B_diag <- 100000

# Stopping time: stable across seeds
means_stop <- sapply(seeds, function(s) {
  set.seed(s); mean(replicate(B_diag, run_until_4H()))
})

# Cauchy: unstable across seeds
means_cauchy <- sapply(seeds, function(s) {
  set.seed(s); mean(rcauchy(B_diag))
})

df_diag <- data.frame(
  seed = rep(seeds, 2),
  mean = c(means_stop, means_cauchy),
  dist = rep(c("Stopping Time — E[X] = 30",
               "Cauchy — E[X] does not exist"), each = length(seeds))
)

ggplot(df_diag, aes(x = factor(seed), y = mean)) +
  geom_point(size = 2, color = "steelblue") +
  geom_hline(data = data.frame(
      dist = c("Stopping Time — E[X] = 30",
               "Cauchy — E[X] does not exist"),
      yint = c(30, 0)),
    aes(yintercept = yint), linetype = "dashed") +
  facet_wrap(~ dist, scales = "free_y") +
  labs(x = "Seed", y = expression(bar(X)[B]),
       title = paste0("MC estimate across 25 independent runs (B = ",
                      formatC(B_diag, format = "d", big.mark = ","), ")")) +
  theme_minimal() +
  theme(axis.text.x = element_text(size = 7))

set.seed(51)
B_diag <- 100000
X_stop_diag <- replicate(B_diag, run_until_4H())
X_cauchy_diag <- rcauchy(B_diag)

df_running <- data.frame(
  B = rep(1:B_diag, 2),
  cum_mean = c(cumsum(X_stop_diag) / seq_along(X_stop_diag),
               cumsum(X_cauchy_diag) / seq_along(X_cauchy_diag)),
  dist = rep(c("Stopping Time — E[X] = 30",
               "Cauchy — E[X] does not exist"), each = B_diag)
)

print(ggplot(df_running, aes(x = B, y = cum_mean)) +
  geom_line(color = "steelblue", linewidth = 0.4) +
  geom_hline(data = data.frame(
      dist = c("Stopping Time — E[X] = 30",
               "Cauchy — E[X] does not exist"),
      yint = c(30, 0)),
    aes(yintercept = yint), linetype = "dashed", color = "red") +
  facet_wrap(~ dist, scales = "free_y") +
  scale_x_continuous(labels = scales::comma, expand = c(0, 0)) +
  labs(x = "B (number of replications)",
       y = expression(bar(X)[B]),
       title = "Cumulative Average as Diagnostic") +
  theme_minimal())

X_c <- X_cauchy_diag
# Full sample vs. trimmed
full_mean <- mean(X_c)
trim_99 <- mean(X_c[abs(X_c) <= quantile(abs(X_c), 0.99)])
trim_95 <- mean(X_c[abs(X_c) <= quantile(abs(X_c), 0.95)])

# Same for stopping time (X_stop: the earlier 100,000 stopping-time draws)
full_mean_s <- mean(X_stop)
trim_99_s <- mean(X_stop[X_stop <= quantile(X_stop, 0.99)])
trim_95_s <- mean(X_stop[X_stop <= quantile(X_stop, 0.95)])

df_trim <- data.frame(
  Trim = rep(c("None", "Top 1%", "Top 5%"), 2),
  Mean = c(full_mean, trim_99, trim_95,
           full_mean_s, trim_99_s, trim_95_s),
  Dist = rep(c("Cauchy", "Stopping Time"), each = 3)
)
df_trim$Trim <- factor(df_trim$Trim, levels = c("None", "Top 1%", "Top 5%"))

print(ggplot(df_trim, aes(x = Trim, y = Mean)) +
  geom_col(fill = "steelblue", alpha = 0.7, width = 0.5) +
  geom_hline(data = data.frame(Dist = c("Cauchy", "Stopping Time"),
                               yint = c(0, 30)),
    aes(yintercept = yint), linetype = "dashed") +
  facet_wrap(~ Dist, scales = "free_y") +
  labs(x = "Observations Trimmed",
       y = expression(bar(X)),
       title = "Sensitivity of MC Estimate to Trimming Extreme Values") +
  theme_minimal())

Next Lecture: Using MC simulations to evaluate finite-sample performance of estimators and inference procedures.
Future Lectures:
Use of MC simulation in implementing randomization inference.
Use of MC simulation in implementing bootstrap inference.
Econ 5551, Spring 2026